# FP8 Quantization

## Qwen2.5 VL 32B Instruct FP8 Dynamic

BCCard · Apache-2.0 · Image-to-Text, Transformers, English · Downloads: 140 · Likes: 1

An FP8 quantized version of the Qwen2.5-VL-32B-Instruct model that accepts visual and text input and produces text output, suited to efficient inference scenarios.
## Qwen3 235B A22B FP8 Dynamic

RedHatAI · Apache-2.0 · Large Language Model, Transformers · Downloads: 2,198 · Likes: 2

An FP8 quantized version of the Qwen3-235B-A22B model that reduces GPU memory requirements and improves computational throughput, suitable for a wide range of natural language processing scenarios.
## Qwen3 30B A3B FP8

Qwen · Apache-2.0 · Large Language Model, Transformers · Downloads: 107.85k · Likes: 57

Qwen3 is the latest generation of the Qwen (Tongyi Qianwen) series of large language models, offering a complete suite of dense and mixture-of-experts (MoE) models. Through large-scale training, Qwen3 has achieved breakthroughs in reasoning, instruction following, agent capabilities, and multilingual support.
## Gemma 3 27b It FP8 Dynamic

RedHatAI · Apache-2.0 · Image-to-Text, Transformers, English · Downloads: 1,608 · Likes: 1

A quantized version of google/gemma-3-27b-it with weights quantized to the FP8 data type. It accepts visual and text input, produces text output, and can be deployed for efficient inference with vLLM; a minimal loading sketch follows.
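The entry above mentions deployment with vLLM. As a minimal sketch, assuming vLLM is installed and that the checkpoint is published under the repository id used below (an assumption, not confirmed by this listing), offline inference with an FP8-dynamic model looks roughly like this:

```python
# Minimal sketch: load an FP8-dynamic checkpoint with vLLM and run a text prompt.
# The model id below is an assumption; substitute the actual repository name.
from vllm import LLM, SamplingParams

llm = LLM(
    model="RedHatAI/gemma-3-27b-it-FP8-dynamic",  # assumed repo id
    max_model_len=4096,
)
params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(["Summarize the benefits of FP8 inference."], params)
print(outputs[0].outputs[0].text)
```

vLLM typically detects the quantization scheme from the checkpoint's configuration, so pre-quantized FP8 weights usually need no extra quantization flag.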
## Qwen3 0.6B FP8

Qwen · Apache-2.0 · Large Language Model, Transformers · Downloads: 5,576 · Likes: 43

Qwen3-0.6B-FP8 is the latest generation of the Qwen (Tongyi Qianwen) series of large language models, offered as a 0.6B-parameter FP8 quantized checkpoint that supports switching between thinking and non-thinking modes and handles multilingual tasks; see the sketch after this entry.
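The mode switching mentioned above follows the Qwen3 chat-template convention of an `enable_thinking` flag. A minimal sketch with the transformers library, assuming the FP8 checkpoint loads through the standard `from_pretrained` flow on suitable hardware and that the repository id below is correct:

```python
# Minimal sketch: toggle Qwen3 thinking mode via the chat template.
# Assumes transformers is installed and the checkpoint id below is correct.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-0.6B-FP8"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Briefly explain FP8 quantization."}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # set True to enable the thinking mode
)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```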
## FLUX.1 Dev ControlNet Union Pro 2.0 Fp8

ABDALLALSWAITI · Other license · Image Generation, English · Downloads: 2,023 · Likes: 15

The FP8 quantized version of the Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0 model, converted from the original BFloat16 weights using PyTorch's native FP8 support to optimize inference performance; a casting sketch follows.
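The BFloat16-to-FP8 conversion mentioned above can be illustrated with PyTorch's native float8 dtypes. This is an illustrative sketch of one common recipe (per-tensor scaling into float8_e4m3fn), not the exact procedure used for the published checkpoint:

```python
# Illustrative per-tensor FP8 quantization with PyTorch's native float8 dtype.
# The scaling scheme is an assumption; real pipelines may use per-channel scales.
import torch

def quantize_fp8(weight: torch.Tensor):
    """Quantize a BFloat16/FP32 tensor to float8_e4m3fn with a per-tensor scale."""
    fp8_max = torch.finfo(torch.float8_e4m3fn).max        # about 448
    scale = weight.abs().max().clamp(min=1e-12) / fp8_max
    q = (weight / scale).to(torch.float8_e4m3fn)           # 1 byte per element
    return q, scale

def dequantize_fp8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover a BFloat16 tensor where higher precision is needed."""
    return q.to(torch.bfloat16) * scale

w = torch.randn(4096, 4096, dtype=torch.bfloat16)
q, s = quantize_fp8(w)
print(q.dtype, q.element_size())  # torch.float8_e4m3fn 1
```

Storing weights at 1 byte per element instead of 2 is what roughly halves disk size and GPU memory relative to BFloat16.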
## Qwen2.5 VL 7B Instruct FP8 Dynamic

RedHatAI · Apache-2.0 · Image-to-Text, Transformers, English · Downloads: 25.18k · Likes: 1

An FP8 quantized version of Qwen2.5-VL-7B-Instruct that supports efficient vision-text inference through vLLM; a serving sketch follows.
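Beyond offline use, vLLM can serve such a checkpoint behind an OpenAI-compatible API. A minimal sketch, assuming a server started with `vllm serve <model>` on the default port; the repository id and image URL below are placeholders, not confirmed by this listing:

```python
# Minimal sketch: query a vLLM OpenAI-compatible server with an image + text prompt.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="RedHatAI/Qwen2.5-VL-7B-Instruct-FP8-dynamic",  # assumed repo id
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }],
    max_tokens=128,
)
print(response.choices[0].message.content)
```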
## Qwen2.5 VL 3B Instruct FP8 Dynamic

RedHatAI · Apache-2.0 · Image-to-Text, Transformers, English · Downloads: 112 · Likes: 1

An FP8 quantized version of Qwen2.5-VL-3B-Instruct that supports visual-text input and text output with improved inference efficiency.
## Deepseek R1 Distill Llama 70B FP8 Dynamic

RedHatAI · MIT · Large Language Model, Transformers · Downloads: 45.77k · Likes: 9

An FP8 quantized version of DeepSeek-R1-Distill-Llama-70B that improves inference performance by reducing the bit width of weights and activations.
## Bamba 9B V1

ibm-ai-platform · Apache-2.0 · Large Language Model · Downloads: 16.19k · Likes: 35

Bamba-9B is a decoder-only language model based on the Mamba-2 architecture, trained in two stages, and performs well across a wide range of text generation tasks.
## Pixtral 12b FP8 Dynamic

RedHatAI · Apache-2.0 · Image-to-Text, Safetensors, Multiple Languages · Downloads: 87.31k · Likes: 9

pixtral-12b-FP8-dynamic is a quantized version of mistral-community/pixtral-12b. Quantizing weights and activations to the FP8 data type reduces disk size and GPU memory requirements by approximately 50%. It is suitable for commercial and research use in multiple languages.
## Llama 3.2 11B Vision Instruct FP8 Dynamic

RedHatAI · Image-to-Text, Safetensors, Multiple Languages · Downloads: 2,295 · Likes: 23

A quantized model based on Llama-3.2-11B-Vision-Instruct, suitable for multilingual commercial and research use, including assistant-style chat scenarios.
## Deepseek Coder V2 Lite Instruct FP8

RedHatAI · Other license · Large Language Model, Transformers · Downloads: 11.29k · Likes: 7

An FP8 quantized version of DeepSeek-Coder-V2-Lite-Instruct, optimized for inference efficiency and suitable for commercial and research use in English.
## Meta Llama 3 70B Instruct FP8

RedHatAI · Large Language Model, Transformers, English · Downloads: 22.10k · Likes: 13

Meta-Llama-3-70B-Instruct-FP8 is a quantized version of Meta-Llama-3-70B-Instruct. FP8 quantization reduces disk size and GPU memory requirements while maintaining strong performance. It is suitable for commercial and research use in English.